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Abstract 

Inherent structure theory is used to discover strong connections between simple characteristics of 
protein structure and the energy landscape of a Go model. The potential energies and vibrational 
free energies of inherent structures are highly correlated, and both reflect simple measures of 
networks of native contacts. These connections have important consequences for models of protein 
dynamics and thermodynamics. 
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Protein activity is controlled by dynamical transitions among conformational substates 
the transitions may be understood in terms of motions on an energy landscape 
Substates correspond to local minima in the energy landscape, and transitions correspond 
to the hurdling of barriers between minima. Interestingly, the protein energy landscape 
resembles that of glasses j^]. 

Spin-glass models have yielded insight into properties of protein energy landscapes 
and the kinetics of protein folding [6[]. The main motivation for using spin-glass models 
rather than structural-glass models is that spin-glass models are more analytically tractable; 



however, it has long 
to describe proteins 



Deen recognized that structural-glass models might be better-suited 
5]. Indeed, protein unfolding has been characterized as a rigidity 



transformation that is similar to those seen in network glasses Q. 

Structural-glass-forming liquids have been fruitfully characterized using inherent struc- 
ture (IS) theory j^l, 9], which treats the energy landscape as a set of discrete basins that are 
separated by saddles. Each basin contains a local minimum, called an inherent structure, 
which is analogous to a protein conformational substate. The dynamics are then naturally 
described as vibrations about local minima, punctuated by transitions between neighboring 
basins. A key assumption in IS theory is that vibrations are similar about minima with the 
same potential energy; however, importantly, IS theory allows for diversity among vibrations 

that have different potential energies. 

□ 

Guo and Thirumalai [10[ have used IS theory to analyze fluctuations in the neighbor- 
hood of the native state of a coar se-g rained model of a designed four-helix bundle protein. 
Baumketner, Shea, and Hiwatari [Uj have applied IS theory to study the glass transition 
in a coarse-grained model of a 16-residue polypeptide; by IS analysis of molecular dynam- 
ics trajectories, they demonstrated the ability to rigorously calculate the glass transition 
temperature due to freezing in the native-state basin. In a more recent study, Nakagawa 
& Peyrard [lij used IS theory to analyze the energy landscape of a protein G Bl domain 
using a coarse-grained model, finding that the density of minima increases exponentially 
with the energy. Importantly, their analysis relied on an assumption that vibrations are 
the same about all potential energy minima. However, because proteins become less rigid 
as noncovalent bonds are broken [7|, vibrations are expected to be different for different 
minima, especially for minima with different potential energies. Diversity in vibrations not 
only would change the density of minima, but also would have important implications for 
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the kinetics of transitions among conformational substates [l|; however, if vibrations are 
the same for different minima, their role in determining the kinetics of transitions would be 
trivial. 

To characterize the diversity in vibrations among different protein inherent structures, we 
used IS theory to analyze a coarse-grained Go model of the same protein fragment considered 
by Nakagawa & Peyrard 12]: GB1, a protein G Bl domain (Protein Data Bank [3] entry 



2GB1 



141]). GB1 has 56 amino acids and consists of a four-stranded /3-sheet packed against 



a single helix. 

A configuration x is represented by the set of C a positions, and the Go model potential 
energy U(x) for C n configurations x of all proteins is similar to that used by the GB1 studies 



in Refs. 



15| and (l2|: 
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The first term in Eq. is the contribution from neighboring backbone C a bond distances, 
the second is from angles between neighboring bonds, the third is from dihedral angles, 
the fourth is from noncovalent interactions between native contacts, and the fifth is from 
noncovalent interations between other pairs of atoms. The crystal structure was used as the 
reference structure to determine r^o, 9i,o, and r^o, with native contacts determined using a 
cutoff distance of 5.5 A. Other parameter values are K b = 200e A 2 , Kg = 40e rad~ 2 , = 
0.3eo, e = 0.18eo, and C = 4 A. The absolute energy unit eo = 1.89 Kcalmol -1 is determined 



as in Ref. 



151 ]. assuming a folding temperature of 350 K. Frustration is introduced through 



the dihedral angle terms, which are not defined with respect to the reference structure. 

Langevin dynamics simulations were performed using a time step 0.0007r and a friction 
coefficient of 0.2/r, where r = 1.47 ps (following Ref. [jj]]). The collapse temperature 
To, at which extended and collapsed configurations are approximately equally likely, was 
located by analysis of the specific heat jl2|] . A single trajectory at temperature Tg with 
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3 x 10 8 time steps was sampled every 10 4 steps to obtain an ensemble of 3 x 10 4 inherent 
structures for analysis. Local minima e a , corresponding to inherent structures a, were found 
using conjugate gradient minimization terminated when a step resulted in an energy change 
of just 10~ 12 . The protein exhibited multiple transitions between extended and collapsed 
states during the course of the simulation, and the inherent structure ensembles exhibited 
a bimodal probability distribution Pjs(e a ,Tg) of collapsed and extended inherent-structure 
potential energies e a , similar to the distribution in Ref. 12]. 

Like a previous application of IS theory to proteins by Baumketner, Shea, and Hiwatari 
111 ], we replace the configurational integral in the partition function for an isolated protein 
with a sum over contributions from individual inherent structures: 



3N 

dx e -[U(x)- eoi ]/k B T 

) 



= ^e~ [e< * +F " (Q,T)]/ ' CsT , (2) 

a 

which defines the vibrational free energy F v (a, T). In Eq. (j2J), R(a) is the basin surrounding 
inherent structure a, Aj is the thermal wavelength of atom i, and a is a factor to account 
for symmetries. 

Values of F v (a,T), calculated as differences with respect to the native structure a = 
(the same holds for values of e a ), were estimated at the collapse temperature T g using a 
harmonic approximation, 

k T 3N A (q ) 

F^T) = ^-^% (3) 

Z i=7 \ 

where X[ is the i th eigenvalue of the Hessian hj k = d 2 U / dxjdxk calculated at the energy 
minimum corresponding to inherent structure a, and A$ is the same for the ground-state 
inherent structure. The sum is over all modes with nonzero frequency: we neglect a contri- 
bution due to changes in the rotational entropy for different inherent structures. Values of 
F v are similar for inherent structures with a similar potential energy e a (Fig. [T]). The contri- 
bution to F v from the highest 1/3 of the eigenvalues does not change for different inherent 
structures. Interestingly, there is a gap in the eigenvalue spectrum between the lowest 2/3 
and the highest 1/3 of the eigenvalues; in addition, only the highest 1/3 of the eigenvalues 
change when the bond-distance force constant Kj, is increased by a factor of ten, indicating 
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FIG. 1: Vibrational free energies F v of inherent structures vs. their potential energies e a (black 
points). The dependence is well-modeled by Eq. (jH) with e c = SSAksTg (piecewise-linear red line). 
The contribution from the highest 1/3 of the eigenvalues is constant (green points). 



that the corresponding modes describe the bond vibrations. Therefore bond vibrations do 
not change significantly among different inherent structures. However, the total F v , which 
includes contributions from the lowest 2/3 of the eigenvalues, changes significantly with e a 
(Fig. [1]). The assumption of constant F v by Nakagawa & Peyrard [l2j therefore is only valid 
for the modes that involve bond vibrations. This result is consistent with studies of the 
loss of protein rigidity upon protein unfolding p|, and is also consistent with molecular dy- 
namics studies suggesting that vibrations can be diverse for different protein conformational 



substates 
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As demonstrated by the fit in Fig. [fl F v is well-modeled using the function 

F v (e a ) = k 2 e a + (k 2 - k x )k B T e 

x In (e~ ec/kBTB + e - ea/kBTe ) . 



(4) 



Equation (0J is essentially a piecewise-linear function with slope k\ for e a < e c , and slope 
k 2 for e a > e c . For GB1, k x = -0.40, k 2 = -0.91, and e c = 88Ak B T e . 

Inherent structure theory 0, ^ assumes that F v (a,T) = F v (e a ,T) (validated for the 
present application in Fig. [1]), and relates the vibrational free energy F v (e a ) and the proba- 
bility distribution Pis{e a ) to the density of states Qis( e a) through 



Pis(e a ,T) = ^n IS (e a )e-^ k * T e- F ^ T ^ T . 



(5) 
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FIG. 2: Density of inherent structures fi/g (e a ). The knee at e r = ATAksTg is due to a change in 
stress, and the plateau beginning at roughly e c = 88AksTg is due to a change in rigidity; both are 
understood in terms of the network of native contacts (Figs. [3l 0]). 



Given Q IS (e ) = 1, F v (e ,T) = 0, and e = 0, £l IS {e a ) is given by 



n IS (e a ) = e ^ B T e F^T )/kB T Pis(e a ,T) (6) 



which generalizes a similar equation in Nakagawa & Peyrard [12J to values of F v (e a , T) that 
vary with e a . 

We used Eq. fl6]) along with the calculated P/s(e a ) and F„(e a ) from Eq. (0J to model the 
density of inherent structures fi/5(e a ). At energies below e c , 0/s(e a ) exhibits an exponential 
increase, but with a slight increase in the exponent factor above an energy e r , giving rise to a 
knee in the plot of log Qjs(e a ) vs. e a (Fig. [2]). The knee is located at a minimum in Pis(e a ) 
between the extended and collapsed states, and is thus associated with the transition state. 
Such a knee was also seen in a previous model of flisi^a) that did not consider vibrations 12]. 
Above e c , Q IS (e a ) plateaus and decreases at the highest energies, which is a consequence of 
the structure in both Pis(e a ) an d F v (e a ,Tg). Rather than being exponential in form [121 ]. 
from Eqs. [4] and [6], Qis{e a ) in this region has the form 

n IS (e a ) = e (W*BT e (b-b)e./W Pis(e a , T) (?) 

Pis{eo, T) 

Because k 2 is close to -1 at Tg, the structure of fi/s(e a ) closely resembles that of P(e a ,Tg) 
above e c . 
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FIG. 3: Potential energy e a vs. number of broken contacts n a . The dependence is well-modeled 
by Eq. ([8]) (red line). The energy required to break a native contact is approximately equal to the 
binding energy e, with differences due to stress in the structure. The knee corresponds to a change 
in stress at n r = 60, where e a {n r ) = 59. QksTg. 

We found (Fig. [3]) that e a is closely related to the number of broken native contacts, n a 
through the piecewise-linear function 



The slopes hi and h% correspond to the amount of energy required to break a native contact 
below [h\) and above (h 2 ) a critical number of broken contacts n r . Data for GB1 are well- 
modeled using hi = 0.997, h 2 = 0.622, and n r = 60. Below n r , breaking a native contact 
requires more potential energy than above n r . Therefore, n r is associated with a change in 
protein stress. 

There are interesting connections between the structure of Clis(e a ) below e c (Fig. [2]) and 
the dependence of e a on n a (Fig. [3]). The change in the slope of flis at e r is closely related 
to the change in the slope of e a (n a ) at n r , suggesting that fljs has a simple exponential 
dependence on n r below e c . However, the knee in Qis occurs at e r = 47AkBTg, which is 
smaller than the value e a {n r ) = 59.6/cB^e at the knee in Fig.[3j While the density of inherent 
structures might truly be enhanced in the gap between e r and e a (n r ), we note that the shift 
of e r with respect to e a (n r ) might indicate that the inherent structure basins associated with 
the transition state are especially large (as noted above, e r is associated with the transition 
state), and that the harmonic approximation might be especially ill-suited to estimating 




(8) 



7 



60 



40 - / 

20 - 

40 80 120 160 200 

n 

a 

FIG. 4: Number of residues m a for which all native contacts are broken vs. the number of broken 
contacts n a . The dependence is well- modeled by Eq. ([9]) (black line). Note that m a = 55 (close to 
the expected value of 56) at n a = 207. 

their free energies for use in Eq. ([6]). 

The source of the plateau in flis{e a ) above e c may be understood in terms of the de- 
pendence of the free energy on both n a and the number of residues m a for which all native 
contacts are broken. As shown in Fig. HJ a plot of m a vs. n a is well-modeled by the function 

m a = i (e^ 40 - l) , (9) 

supporting an expectation that breaking a contact is only likely to create a residue with no 
native contacts at high n a . The following simple model for F v then successfully captures 
the structure of F v in Fig. [TJ 

F v {n a ) = vn a + fim a , (10) 

with m a given by Eq. (|T0|) . Using v = —0.32 and fi = —1.07 yields good agreement between 
values of F v obtained either directly from the Hessian or using Eq. ffTUl) . with a correlation 
coefficient of 0.993 for values calculated from all inherent structures. We conclude that the 
change in the slope of F v vs. e a at e c , and therefore the plateau in flis(e a ) above e c , is 
associated with an increase in the likelihood that breaking a native contact will increase the 
number of residues with no native contacts. 

We found that protein stress and rigidity are closely tied to the network of native contacts 
through Eqs. [8] and [TO] . This finding is remeniscent of an analysis of protein folding by 
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Rader et al. [7|, in which the loss of network rigidity was associated with protein unfolding. 
It is therefore tempting to associate the region between e r and e c in Fig. [2] with the region 
of the mean coordination number (r) where Rader et al. found that proteins become floppy 
and unfold. However, the present approach differs from that used by Rader et al. in two 
key ways. First, because all residue interactions in the present study are lumped into C a 
atoms, the coordination numbers are higher, and the relation of coordination numbers to 
protein rigidity might be different than for the all-atom models considered by Rader et al. 
Second, whereas the present results were obtained using a dynamical model, those obtained 
by Rader et al. were obtained using a static model of the protein. It will be interesting to 
further explore connections between the analyses based on IS theory and network rigidity; 
at present, they provide complementary perspectives on the relationship between protein 
dynamics and protein stress and rigidity. 

The maximum in the density of states above e c is a consequence of considering diversity 
in vibrations, and is not observed when uniform vibrations are assumed [12]. Interestingly, 
a similar structure for the density of states, in which an exponential increase is followed by a 
maximum, has been observed for many structural-glass-forming liquids [9] . It will therefore 
be interesting to improve the estimation of the density of states by obtaining more accurate 
estimates of F v than are possible using a harmonic approximation jll| . 

Studies of two other Go models of proteins yielded results that are similar to those found 
here for GB1 (unpublished results), suggesting the possibility that a simple phenomenolog- 
ical relationship between the network of native contacts and the energy landscape might 
exist for all Go models. It will be interesting to explore this relationship for a large number 
of proteins and seek representations in which it is identical for different proteins. Discovery 
of such "universality" would enable the prediction of important properties of the energy 
landscapes of Go models without performing numerical simulations. 

It will be important to extend the present results to models whose energy landscapes 
exhibit more frustration than Go models. For example, consider a modified model in which 
there is a weak attractive interaction for non-native contacts. In contrast to the simple 
relation illustrated in Fig. [3j in such a model, inherent structures with the same potential 
energy would likely have diverse numbers of native contacts. However, by extending the 
parameter space, the energy still might be simply related to a combination of both the 
number of native contacts and the number of non-native contacts. Similarly, the vibrational 
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free energies might exhibit diversity among inherent structures with the same energy, but 
might still be simply related to both the number of native contacts and non-native contacts 
through an equation analogous to Eq. (TlQi) . Ultimately, it will be interesting to incrementally 
increase the complexity of the model, extending the present results (as far as computationally 
feasible) to realistic, all-atom models of proteins that include explicit solvent and other effects 
that are important in controlling protein function. Use of such all-atom models will enable 
the link between analyses based on IS theory and network rigidity to be further explored. 

The present results demonstrate that simple connections to protein structure are hidden 
within the energy landscape of a Go model. The potential energies and vibrational free 
energies of inherent structures are highly correlated, and both reflect simple measures of 
networks of native contacts. Through use of IS theory, these regularities should enable 
significant simplification of thermodynamic models of proteins jsl. M, \l2\. 

We gratefully acknowledge Arthur Voter and Donald Jacobs for discussions. This work 
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